Development of HMM-based acoustic laughter synthesis
Authors
Abstract
Laughter is a key signal in human communication, conveying information about our emotional state and also providing social feedback to our conversational partners. With the development of increasingly natural human-computer interactions (through embodied conversational agents, etc.), the need has emerged to enable computers to understand and express emotions. In particular, to enhance human-computer interactions, talking machines should be able to laugh. Yet, compared to speech synthesis, acoustic laughter synthesis is an almost unexplored domain. Sundaram and Narayanan [5] modeled the rhythmic intensity envelope of laughter with the equations governing an oscillating mass-spring system and synthesized laughter vowels by linear prediction. This approach was interesting, but the produced laughs were judged non-natural by listeners. Lasarcyk and Trouvain [3] compared laughs synthesized by an articulatory system (a 3D model of the vocal tract) with diphone concatenation. The articulatory system gave better results, but its laughs were still rated as significantly less natural than human laughs.

To improve the naturalness of synthesized laughter, we propose to use Hidden Markov Models (HMMs), which have proven effective for speech synthesis. We opted for the HMM-based Speech Synthesis System (HTS) [4], as it is free and widely used in speech synthesis research. The data comes from the AVLaughterCycle database (AVLC), which contains around 1000 laughs from 24 subjects and includes phonetic transcriptions of the laughs [6].

HTS provides a demonstration canvas for speech synthesis, which makes it possible to quickly obtain synthesis models with standard speech parameters. Our first step was to use this canvas to build a baseline for HMM-based laughter synthesis. We then adapted our data and modified parts of the HTS demo to improve the quality of the obtained laughs. The major improvement to the AVLC database, made to better exploit the potential of HTS, is the annotation of laughter “syllables”. This makes it possible to include contextual parameters (e.g. the position of a “phoneme” within its “syllable”, the position of the current “syllable” within the current “word”, etc.) in the synthesis models.
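For reference, the mass-spring modeling of the intensity envelope mentioned above reduces to the canonical damped second-order oscillator; the exact formulation and parameter-fitting procedure used in [5] may differ, so the following is only an illustrative sketch:

    m\,\ddot{x}(t) + c\,\dot{x}(t) + k\,x(t) = 0
    \;\Longrightarrow\;
    x(t) = A\,e^{-\zeta\omega_0 t}\cos(\omega_d t + \varphi),
    \qquad
    \omega_0 = \sqrt{k/m},\quad
    \zeta = \frac{c}{2\sqrt{km}},\quad
    \omega_d = \omega_0\sqrt{1-\zeta^2},

where x(t) plays the role of the laughter intensity envelope and the damping term governs how quickly a laughter bout decays.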
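To make the contextual parameters concrete, here is a minimal sketch of how such a context-dependent label could be assembled, assuming a simplified HTS-style full-context format; the feature names and layout below are hypothetical, not the exact label set used with the AVLC annotations:

# Hypothetical sketch of a context-dependent label for one laughter "phoneme",
# in the spirit of HTS full-context labels. The feature layout is illustrative
# only, not the exact set used in this work.
def make_context_label(prev_ph, cur_ph, next_ph,
                       pos_in_syl, syl_len,
                       pos_syl_in_word, word_len):
    """Build a simplified full-context label string for one laughter phone."""
    return (f"{prev_ph}-{cur_ph}+{next_ph}"
            f"@{pos_in_syl}_{syl_len}"           # position of the phone in its "syllable"
            f"/S:{pos_syl_in_word}_{word_len}")  # position of the "syllable" in its "word"

# Example: a breathy 'h' between two 'a' vowels, first phone of a two-phone
# "syllable" that is the second of three "syllables" in the laughter bout.
print(make_context_label("a", "h", "a", 1, 2, 2, 3))
# -> a-h+a@1_2/S:2_3

In an HTS-style setup, one such label per phone, together with the phone boundaries, is what the training stage consumes to build context-dependent HMMs and cluster them with decision trees.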
Similar resources
Audio-visual Laughter Synthesis System
In this paper we propose an overview of a project aiming at building an audio-visual laughter synthesis system. The same approach is followed for acoustic and visual synthesis. First, a database has been built to obtain synchronous audio and 3D visual landmark tracking data. This data has then been used to build HMM models of acoustic laughter and visual laughter separately. Visual laughter model...
The AV-LASYN Database: A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis
A synchronous database of acoustic and 3D facial marker data was built for audio-visual laughter synthesis. Since the aim is to use this database for HMM-based modeling and synthesis, the amount of collected data from one given subject had to be maximized. The corpus contains 251 utterances of laughter from one male participant. Laughter was elicited with the help of humorous videos. The result...
Upper Body Animation Synthesis for a Laughing Character
Laughter is an important social signal in human communication. This paper proposes a statistical framework for generating laughter upper body animations. These animations are driven by two types of input signals, namely the acoustic segmentation of laughter as a pseudo-phoneme sequence and acoustic features. During the training step, our statistical framework learns the relationship between the la...
Laughter animation synthesis
Laughter is an important communicative signal in human-human communication. However, very few attempts have been made to model laughter animation synthesis for virtual characters. This paper reports our work to model hilarious laughter. We have developed a generator for face and body motions that takes as input the sequence of pseudo-phonemes of laughter and each pseudo-phoneme's duration time. L...
Automatic acoustic synthesis of human-like laughter.
A technique to synthesize laughter based on time-domain behavior of real instances of human laughter is presented. In the speech synthesis community, interest in improving the expressive quality of synthetic speech has grown considerably. While the focus has been on the linguistic aspects, such as precise control of speech intonation to achieve desired expressiveness, inclusion of nonlinguistic...
Journal:
Volume / Issue:
Pages: -
Publication date: 2012